The first example uses the program at the bottom of page 315 (with the
ADDD replaced by MULTD). The program is shown below.
The simulator is invoked by typing dlxsim at the system prompt.
% dlxsim
First, the datafile is loaded using the load command:
load fdata.s
Next, the program may be loaded. The program above was created with an
editor and saved in the file f1.s. It is loaded in the same way
as the datafile.
load f1.s
To verify that the program has been loaded, the get command can
be used to examine memory. The program is loaded at location 256 by
default. The second parameter to get indicates how many words
to dump. The i suffix tells get to dump the contents in
instruction format (i.e., to produce a disassembly).
get 256 9i
start: ld f2,a(r0)
start+0x4: addi r1,r0,0xe0
loop: ld f0,a(r1)
loop+0x4: multd f4,f0,f2
loop+0x8: sd a(r1),f4
loop+0xc: subi r1,r1,0x8
loop+0x10: bnez r1,loop
loop+0x14: nop
loop+0x18: trap 0x0
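The label+offset addresses in this listing follow directly from the load address. The Python sketch below reproduces them, assuming each instruction occupies one 4-byte word starting at 256; the label addresses (start at 256, loop at 264) are inferred from the listing, and the symbolic helper is hypothetical rather than part of dlxsim.

```python
# Sketch: reproduce the symbolic addresses printed by "get 256 9i".
# Assumptions: 4-byte instructions loaded at 256; label addresses are inferred
# from the listing, not read from dlxsim's symbol table.
LOAD_ADDR = 256
WORD_SIZE = 4
LABELS = {"start": 256, "loop": 264}  # loop is the third instruction word


def symbolic(addr: int, labels: dict) -> str:
    """Format addr relative to the closest label at or below it."""
    candidates = [(base, name) for name, base in labels.items() if base <= addr]
    base, name = max(candidates)  # nearest preceding label
    offset = addr - base
    return name if offset == 0 else f"{name}+{offset:#x}"


if __name__ == "__main__":
    for k in range(9):  # the second parameter to get: 9 words
        print(symbolic(LOAD_ADDR + k * WORD_SIZE, LABELS))
```

Given hypothetical data labels x at 8 and xtop at 224, the same helper yields names such as x+0xc8 and xtop, matching the fget listing later in this example.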
To make sure that the statistics are all cleared (as they should be when
dlxsim is first invoked), use the stats command with the relevant
parameters:
stats stalls branch pending hw
Memory size: 65536 bytes.
Floating Point Hardware Configuration
1 add/subtract units, latency = 2 cycles
1 divide units, latency = 19 cycles
1 multiply units, latency = 5 cycles
Load Stalls = 0
Floating Point Stalls = 0
No branch instructions executed.
Pending Floating Point Operations: none.
The hw specifier causes the memory size and floating point hardware
information to be dumped. The stalls specifier causes the total
load stalls and floating point stalls to be displayed. The branch
specifier causes the branch information (taken vs. not taken) to be displayed;
in this case no branches have been executed yet. Finally, the pending
specifier causes the pending operations in the floating point units to
be displayed (none in this case).
Below, the first four instructions are executed using the step
command:
step 256
stopped after single step, pc = start+0x4: addi r1,r0,0xe0
step
stopped after single step, pc = loop: ld f0,a(r1)
step
stopped after single step, pc = loop+0x4: multd f4,f0,f2
step
stopped after single step, pc = loop+0x8: sd a(r1),f4
The stats command can produce some more interesting results
at this point.
stats stalls pending
Load Stalls = 1
Floating Point Stalls = 0
Pending Floating Point Operations:
multiplier #1 : will complete in 4 more cycle(s) 87.964594 ==> F4:F5
A load stall occurred between the third and fourth instructions because of the F0 dependency. The multiply instruction has issued, and is being processed in multiplier unit #1. It will complete and store the double precision value 87.96 into F4 and F5 in four more clock cycles.
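The countdown reported by stats pending can be modeled with a per-unit cycle counter. This Python sketch assumes the 5-cycle multiply latency from the hw output and that the cycle in which MULTD issues counts as the first of the five, which agrees with the "4 more cycle(s)" shown above; the PendingOp class is hypothetical and is not dlxsim's internal data structure.

```python
from dataclasses import dataclass

MULT_LATENCY = 5  # from the hw output: multiply latency = 5 cycles


@dataclass
class PendingOp:
    """Hypothetical model of one in-flight floating point operation."""
    unit: str       # functional unit, e.g. "multiplier #1"
    dest: str       # destination register pair, e.g. "F4:F5"
    value: float    # result to be written back
    remaining: int  # clock cycles left before writeback

    def describe(self) -> str:
        # Mirrors the line printed by "stats pending".
        return (f"{self.unit} : will complete in {self.remaining} more cycle(s) "
                f"{self.value:f} ==> {self.dest}")

    def tick(self) -> bool:
        """Advance one clock cycle; return True once the result is written back."""
        self.remaining -= 1
        return self.remaining == 0


def cycles_until_writeback(op: PendingOp) -> int:
    """Count the cycles an immediately dependent instruction would wait."""
    cycles = 0
    done = op.remaining == 0
    while not done:
        done = op.tick()
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Assumption: the issue cycle of MULTD counts as the first of the five.
    op = PendingOp("multiplier #1", "F4:F5", 87.964594, MULT_LATENCY - 1)
    print(op.describe())
    print(cycles_until_writeback(op))  # 4
```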
The double precision value in F4 can be displayed using the fget
command with a d specifier (for double precision).
fget f4 d
f4: 0.000000
As expected, F4 hasn't received its value yet. Executing one more instruction
will change the statistics:
step
stopped after single step, pc = loop+0xc: subi r1,r1,0x8
stats stalls pending
Load Stalls = 1
Floating Point Stalls = 4
Pending Floating Point Operations: none.
Because the SD instruction uses the result of the multiply instruction,
it could not execute until the multiply had completed. The four floating
point stalls spent waiting for the multiply to complete were recorded as well.
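The per-iteration stall counts can be checked with a simple in-order issue model. This Python sketch is an assumption, not dlxsim's algorithm: a result from an instruction issued in cycle t becomes usable in cycle t + latency, with latency 2 for LD (one stall on immediate use), 5 for MULTD (issue cycle counted as the first of five), and 1 for the integer instructions, which therefore never stall.

```python
# Sketch: per-iteration stall accounting for the loop body, using a simple
# in-order issue model (an assumption, not dlxsim's internal algorithm).
# A result from an instruction issued in cycle t is usable in cycle t + latency.
LATENCY = {"ld": 2, "multd": 5, "sd": 1, "subi": 1, "bnez": 1, "nop": 1}

# (opcode, destination register or None, source registers)
LOOP_BODY = [
    ("ld", "f0", ["r1"]),           # ld    f0,a(r1)
    ("multd", "f4", ["f0", "f2"]),  # multd f4,f0,f2
    ("sd", None, ["r1", "f4"]),     # sd    a(r1),f4
    ("subi", "r1", ["r1"]),         # subi  r1,r1,0x8
    ("bnez", None, ["r1"]),         # bnez  r1,loop
    ("nop", None, []),              # nop
]


def count_stalls(body):
    """Return {"load": n, "fp": m} stall cycles for one pass through body."""
    ready = {}  # register -> (first cycle its value is usable, producing opcode)
    prev_issue = -1
    stalls = {"load": 0, "fp": 0}
    for opcode, dest, srcs in body:
        issue = prev_issue + 1  # in order: at most one instruction per cycle
        for reg in srcs:
            usable, producer = ready.get(reg, (0, None))
            if usable > issue:  # operand not ready: stall until it is
                # Only LD and MULTD have latency > 1, so only they can stall.
                kind = "load" if producer == "ld" else "fp"
                stalls[kind] += usable - issue
                issue = usable
        if dest is not None:
            ready[dest] = (issue + LATENCY[opcode], opcode)
        prev_issue = issue
    return stalls


if __name__ == "__main__":
    print(count_stalls(LOOP_BODY))  # {'load': 1, 'fp': 4}
```

Running only the first two instructions of the body gives one load stall and no floating point stalls, which is the state stats reported just after MULTD issued.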
If F4 is examined now, its value after the writeback is displayed.
fget f4 d
f4: 87.964594
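The d specifier reads F4 and F5 together because a double precision value spans a register pair. The Python sketch below splits the same 64-bit IEEE 754 value into two 32-bit words; which half lands in F4 is an assumption here (big-endian order, high word in F4), and split_double and join_words are hypothetical helpers.

```python
import struct

# Sketch: how one double precision value occupies a pair of 32-bit registers.
# Assumption: F4 holds the more significant word and F5 the less significant
# one, following big-endian memory order.


def split_double(value):
    """Return (high_word, low_word) of the IEEE 754 encoding of value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    return bits >> 32, bits & 0xFFFFFFFF


def join_words(high, low):
    """Rebuild the double from its two 32-bit halves."""
    (value,) = struct.unpack(">d", struct.pack(">Q", (high << 32) | low))
    return value


if __name__ == "__main__":
    f4, f5 = split_double(87.964594)
    print(f"F4 = {f4:#010x}  F5 = {f5:#010x}")
    print(join_words(f4, f5))
```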
To execute the program to completion, the go command can be used.
When the TRAP instruction is detected, the simulation will stop.
go
TRAP #0 received
To view the cumulative stall and branch information, the stats command
can be used.
stats stalls branch
Load Stalls = 28
Floating Point Stalls = 112
Branches: total 28, taken 27 (96.43%), untaken 1 (3.57%)
The loop executed 28 times. There was a single load stall per iteration, for a total of 28 load stalls. There were 4 floating point stalls per iteration, for a total of 112 floating point stalls. Finally, the conditional branch at the bottom of the loop was taken 27 times and fell through on the final iteration. All these statistics are reflected above.
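The arithmetic behind these totals can be reproduced by tracing R1 through the loop. This Python sketch takes the 1 load stall and 4 floating point stalls per iteration from the single-step session as given; branch_counts and stats_report are hypothetical names, and the report format simply copies the stats output above.

```python
# Sketch: reproduce the cumulative statistics from the loop structure.
# Assumption: 1 load stall and 4 floating point stalls per iteration, as
# observed in the single-step session.
LOAD_STALLS_PER_ITER = 1
FP_STALLS_PER_ITER = 4


def branch_counts(r1_start=0xE0, step=8):
    """Trace r1 through the loop; return (iterations, taken, untaken)."""
    # bnez only falls through if r1 reaches exactly 0.
    assert r1_start > 0 and r1_start % step == 0
    iterations = taken = untaken = 0
    r1 = r1_start            # addi r1,r0,0xe0
    while True:
        iterations += 1
        r1 -= step           # subi r1,r1,0x8
        if r1 != 0:          # bnez r1,loop
            taken += 1
        else:
            untaken += 1
            return iterations, taken, untaken


def stats_report():
    """Format the totals the way "stats stalls branch" prints them."""
    iterations, taken, untaken = branch_counts()
    total = taken + untaken
    return [
        f"Load Stalls = {iterations * LOAD_STALLS_PER_ITER}",
        f"Floating Point Stalls = {iterations * FP_STALLS_PER_ITER}",
        f"Branches: total {total}, taken {taken} ({100 * taken / total:.2f}%), "
        f"untaken {untaken} ({100 * untaken / total:.2f}%)",
    ]


if __name__ == "__main__":
    print("\n".join(stats_report()))
```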
To verify the program operated properly, the memory locations containing
the original data can be examined with the fget command. The original
data was stored in the 28 double words beginning at location 8.
fget 8 28d
x: 3.141593
x+0x8: 6.283185
x+0x10: 9.424778
... etc. ...
x+0xc8: 81.681409
x+0xd0: 84.823002
xtop: 87.964594
As expected, the initial integer values have all been multiplied by π.
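The expected contents can also be computed independently and compared with the fget dump. This Python sketch assumes the original data were the integers 1 through 28, consistent with the first (3.141593, i.e. 1 × π) and last (87.964594, i.e. 28 × π) values shown; the labels x and xtop are copied from the listing, and expected_listing is a hypothetical helper.

```python
import math

# Sketch: regenerate the expected fget listing for the 28 result doubles.
# Assumption: the original data were the integers 1..28.
X_ADDR = 8        # data begins at location 8 (label x)
COUNT = 28        # 28 double words
DOUBLE_SIZE = 8   # bytes per double word


def label(addr):
    """Name addr as the listing does: x, x+0x..., or xtop for the last word."""
    xtop = X_ADDR + DOUBLE_SIZE * (COUNT - 1)
    if addr == X_ADDR:
        return "x"
    if addr == xtop:
        return "xtop"
    return f"x+{addr - X_ADDR:#x}"


def expected_listing():
    """Return one 'label: value' line per double word, value = integer * pi."""
    lines = []
    for i in range(COUNT):
        addr = X_ADDR + DOUBLE_SIZE * i
        value = (i + 1) * math.pi
        lines.append(f"{label(addr)}: {value:f}")
    return lines


if __name__ == "__main__":
    listing = expected_listing()
    print("\n".join(listing[:3] + ["... etc. ..."] + listing[-3:]))
```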